
    Building a High-Performance Collective Communication Library

    We report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with worm-hole routing, but the techniques are more general. The approach differs from traditional library implementations in that we address the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes. Our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included.

    Building a high-performance collective communication library


    Design and implementation of MPI on Puma portals

    As the successor to SUNMOS [a], the Puma operating system provides a flexible, lightweight, high performance message passing environment for massively parallel computers. Message passing in Puma is accomplished through the use of a new mechanism known as a portal. Puma is currently running on the Intel Paragon and is being developed for the Intel TeraFLOPS machine. In this paper we discuss issues regarding the development of the Argonne National Laboratory/Mississippi State University implementation of the Message Passing Interface standard on top of portals. Included is a description of the design and implementation for both MPI point-to-point and collective communications, and MPI-2 one-sided communications.

    Experiences Implementing the MPI Standard on Sandia's Lightweight Kernels

    This technical report describes some lessons learned from implementing the Message Passing Interface (MPI) standard, and some proposed extensions to MPI, at Sandia. The implementations were developed using Sandia-developed lightweight kernels running on the Intel Paragon and Intel TeraFLOPS platforms. The motivations for this research are discussed, and a detailed analysis of several implementation issues is presented. Acknowledgment: Appreciation is extended to the following people for contributing to this research: Lee Ann Fisk, Tramm Hudson, Arthur B. Maccabe, Kevin McCurley, Lance Mumma, Rolf Riesen, Lance Shuler, David van Dresser, and Stephen Wheat. In addition, the authors would like to thank Jeff Brown, Pang Chen, and David Womble for useful discussions.

    Disco


    Building a High-Performance Collective Communication Library

    In this paper, we report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with worm-hole routing, but the techniques are more general. The approach differs from traditional library implementations in that we address the need for implementations that perform well for various sized vectors and grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes. Our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included. To obtain this library for Intel systems contact [email protected].

    Interprocessor Collective Communication Library (InterCom)

    In this paper, we outline a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with worm-hole routing, but the techniques also apply to higher dimensional meshes and hypercubes. We stress a general approach, addressing the need for implementations that perform well for various sized vectors and grid dimensions, including non-power-of-two grids. This requires the development of general techniques for building hybrid algorithms. Finally, our approach also supports collective communication within a group of nodes, which is required by many scalable algorithms. Results from the Intel Paragon system are included. 1 Introduction The Interprocessor Collective Communication (InterCom) Project is a comprehensive study of tech- Copyright © 1994 by the Institute of Electrical and Electronics Engineers, Inc. Reprinted with the permiss..